Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference. We present Fast Weight Layers (FWLs), a neural component that provides the benefits of dynamic evaluation much more efficiently by expressing gradient updates as linear attention. A key improvement over dynamic evaluation is that FWLs can also be applied at training time so the model learns to make good use of gradient updates. FWLs can easily be added on top of existing transformer models, require relatively little extra compute or memory to run, and significantly improve language modeling perplexity.
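To make the stated equivalence concrete, the sketch below is a minimal illustration under our own assumptions, not the paper's architecture: the function name `fast_weight_layer` and all shapes are ours. It shows how accumulating rank-1 outer-product "gradient" updates into a fast weight matrix and reading it out with a query reproduces causal linear attention:

```python
import torch

def fast_weight_layer(q, k, v):
    """Causal linear attention written as a running fast-weight update.

    At each step a rank-1 "gradient" v_t k_t^T is added to the fast
    weight matrix W, and the output is the read-out y_t = W_t q_t,
    i.e. y_t = sum_{i<=t} v_i (k_i . q_t).  q, k, v: (seq_len, dim).
    """
    seq_len, dim = q.shape
    W = torch.zeros(dim, dim)             # fast weights start at zero
    out = []
    for t in range(seq_len):
        W = W + torch.outer(v[t], k[t])   # rank-1 update from token t
        out.append(W @ q[t])              # read-out = linear attention
    return torch.stack(out)

# Toy usage: 8 tokens of width 16, keys/queries/values tied for brevity.
h = torch.randn(8, 16)
y = fast_weight_layer(h, h, h)            # shape (8, 16)
```

Because the update is just a cumulative sum of outer products, it can run inside an ordinary forward pass, which is what lets this component be trained end to end rather than applied only at test time.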
Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.
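As a rough illustration of the research-and-revision loop described above, here is a hedged Python sketch; `search` and `llm` are hypothetical stand-ins for evidence retrieval and a prompted language model, not the paper's actual components or prompts:

```python
def rarr(passage, search, llm):
    """Sketch of a research-and-revise loop in the spirit of RARR.

    `search(query)` returns a list of evidence snippets and
    `llm(prompt)` returns generated text; both are assumed callables.
    """
    # Research stage: ask the model what to verify, then retrieve evidence.
    queries = llm(f"List questions to verify this passage:\n{passage}").splitlines()
    evidence = [snippet for q in queries for snippet in search(q)]

    # Revision stage: edit only what the evidence contradicts, keeping
    # the rest of the passage intact.
    revised = passage
    for snippet in evidence:
        verdict = llm(f"Does this evidence contradict the passage?\n"
                      f"Evidence: {snippet}\nPassage: {revised}")
        if verdict.strip().lower().startswith("yes"):
            revised = llm(f"Minimally edit the passage to agree with the "
                          f"evidence.\nEvidence: {snippet}\nPassage: {revised}")
    return revised, evidence  # revised text plus its attribution report
```

The design point the sketch tries to capture is that the generator is never retrained: attribution and correction are retrofitted onto its frozen output.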
General-purpose unstructured neural networks have been shown to struggle with out-of-distribution compositional generalization. Compositional data augmentation via example recombination has transferred some prior knowledge about compositionality to black-box neural models on several semantic parsing tasks, but it often requires task-specific engineering or yields limited gains. We present a more powerful data recombination method using a model called the Compositional Structure Learner (CSL). CSL is a generative model with a quasi-synchronous context-free grammar backbone, which we induce from the training data. We sample recombined examples from CSL and add them to the fine-tuning data of a pre-trained sequence-to-sequence model (T5). This procedure effectively transfers most of CSL's compositional bias to T5 on diagnostic tasks, and yields a model even stronger than a T5-CSL ensemble on two real-world compositional generalization tasks. The result is new state-of-the-art performance on these challenging semantic parsing tasks, which require generalizing to both natural language variation and novel compositions of elements.
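The augmentation step lends itself to a short sketch. The fragment below is illustrative and rests on our own assumptions: `csl_sample` is a hypothetical stand-in for drawing one recombined (input, output) pair from the induced grammar model, and the mixed data would then feed standard T5 fine-tuning:

```python
import random

def augment_with_csl(train_pairs, csl_sample, n_new, seed=0):
    """Mix grammar-sampled recombinations into seq-to-seq training data.

    `train_pairs` is a list of (source, target) strings; `csl_sample`
    is an assumed callable returning one recombined (source, target)
    pair per call from the induced CSL model.
    """
    random.seed(seed)
    synthetic = [csl_sample() for _ in range(n_new)]
    mixed = list(train_pairs) + synthetic
    random.shuffle(mixed)     # interleave real and recombined examples
    return mixed              # pass to ordinary T5 fine-tuning
```

Keeping the grammar outside the neural model and transferring its bias purely through sampled data is what lets the final system remain a single standard T5 at inference time.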